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Abstract 



We discuss schemes for exact and approximate computations of permanents, and compare 
them with each other. Specifically, we analyze the Belief Propagation (BP) approach and 
\& its Fractional BP generalization for computing the permanent of a non-negative matrix. 

Known bounds and conjectures are verified in experiments, and some new theoretical re- 
lations, bounds and conjectures are proposed. The fractional Free Energy (FE) functional 
is parameterized by a scalar parameter 7 G [—1; 1], where 7 = — 1 corresponds to the BP 
Q limit and 7 = 1 corresponds to the exclusion principle (but ignoring perfect matching con- 

^ straints) Mean-Field (MF) limit, and shows monotonicity and continuity of the functional 

O with 7. We observe that the special value of 7, where the 7-parameterized FE functional is 

equal to the exact FE (defined as the minus log of the permanent), lies in the [—1; 0] range, 
£C) with the low and high values from the range producing provable lower and upper bounds 

^ for the permanent. Our experimental analysis suggests that the special 7 varies for differ- 

ent ensembles but that it always lies in the [—1; —1/2] interval. Besides, for all ensembles 
considered the behavior of the special 7 is highly distinctive, offering a practical potential 
for estimating permanents of non-negative matrices via the fractional FE approach. 

qq Keywords: Permanent, Graphical Models, Belief Propagation, Exact and Approximate 

Algorithms, Learning Flow Models 

• • 

> 1. Introduction 

This work is motivated by computational challenges associated with learning stochastic 
flows from two consecutive snapshots/images of n identical particles immersed in a flow 



Chertkov et al. (2008); Chertkov et al. (2010). The task of learning consists in maximizing 



permanent of the n x n matrix, with elements constructed of probabilities for a particle in 
the first image to correspond to a particle in the second image, over the low-dimensional 
parametrization of the reconstructed flow. The permanents in this enabling application are 
nothing but a weighted number of perfect matchings relating particles in the two images. 

Inspired by this "learning the flow" application, we continue in this manuscript the 



thread of |Watanabe and Chertkov (2010) and focus on computations of positive perma- 
nents of non-negative matrices constructed from probabilities. Exact computation of the 
permanent is difficult, i.e. it is a problem of likely exponential complexity, with the fastest 
known general algorithm for computing the permanent of a full n x n matrix based on 



Ryser's formula ( |Ryser[ 1963) requiring 0(n2 n ) operations. In fact, the task of computing 
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the permanent of a non-negative matrix was one of the first problems established to be in 



the #-P (sharp-P) complexity class, which is also complete in the class Valiant (1979). 

Therefore, recent efforts have mainly focused on developing approximate algorithms. 
Three independent developments, associated with the mathematics of strict bounds, Monte- 
Carlo sampling, and Graphical Models, contributed to this field. 

On the "mathematics of permanents" side, the emphasis was on establishing rigorous 
lower and upper bounds for permanents. Many significant results in this line of research 
are related to the van der Waerden ( van der Waerden] |1926 ) conjecture that the minimum 
of the permanent over doubly stochastic matrices is n!/n n , and it is only attained when 
all entries of the matrix are 1/n. The conjecture remained open for over 50 years before 



Falikman (Falikman, 1981) and Egorychev (Egorychev, 1981) proved it. Recently, Gurvits 



(Gurvits, 2008) found an alternative, surprisingly short and elegant proof that also allowed 
for a number of unexpected extensions. (See e.g. the discussion of Laurent and Schrijver 
( [20101 ).) 



On the "Monte-Carlo sampling" side, a very significant breakthrough was achieved with 
the invention of the Fully Polynomial Randomized Algorithmic schemes (FPRAS) for the 



permanent problem Jerrum et al. (2004): the permanent is approximated in polynomial 
time, provably with high probability and within an arbitrarily small relative error. The 
complexity of the FPRAS of Jerrum et al. (2004) is 0(n n ) in the general case. Even 



though the scaling was improved to 0(n 4 log n) in the case of very dense matrices Huber 



and Law ( 2008[ ), the approach is still impractical for majority of realistic applications 

On the "Graphical Model" (GM) side, Belief Propagation (BP) heuristics showed sur- 
prisingly good performance |Chertkov et al. ( |2008| ) ; |Huang and Jeba"ra ( 2009 ) ; Chertkov 
et al. (2010). The BP family of algorithms, originally introduced in the context of error- 



correction codes by Gallager (Gallager, 1963), artificial intelligence by Pearl ( Pearl[ 



and related to some early theoretical work in statistical physics by Bethe (Bethe 



and Peierls (Peierls, 1936) on tree graphs, can generally be stated for any GM Yedidia et al. 



1988 



1935 



(2005). The exactness of the BP on any tree, i.e. on a graph without loops, suggests that 



the algorithm can be an efficient heuristic for evaluating the partition function or for find- 
ing a Maximum Likelihood (ML) solution for the graphical model (GM) defined on sparse 
graphs. However, in the general loopy cases, one would normally not expect BP to work 



very well, making the heuristic results of Chertkov et al. (2008); Huang and Jebara (2009); 



Chertkov et al. (2010) somehow surprising, even though not completely unexpected in view 



of the existence of polynomially efficient algorithms for the ML version of the problem 



Kuhn| (|1955j); |Bertsekas| Ql992| ), which were shown by |Bayati et aL Q2008| ) to be equivalent 
to an iterative algorithm of the BP type. This raises questions about understanding the 



performance of BP. To address this challenge Watanabe and Chertkov (2010) established 
a theoretical link between the exact permanent and its BP approximation. The perma- 
nent of the original non-negative matrix was expressed as a product of terms, including 
the BP-estimate and another permanent of an auxiliary matrix, 0. * (1 — 0) Q where 
is the doubly stochastic matrix of marginal probabilities of links between particles in the 
two images (edges in the underlying GM) calculated using the BP approach. (See Theorem 
3l) The exact relation of Watanabe and Chertkov (2010) followed from the general Loop 



1. Here and below we will follow Matlab notations for the component- wise operations on matrices, such as 
A. * B for the component- wise, Hadamard, product of matrices A and B. 
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Calculus technique of Chertkov and Chernyak (Chertkov and Chernyak, 2006a|b ), but it 
also allowed a simple direct derivation. Combining this exact relation with aforementioned 
results from the mathematics of permanents led to new lower and upper bounds for the 
original permanent. Moreover this link between the math side and the GM side gained a 



new level of prominence with the recent proof by Gurvits (Gurvits, 2011) of the fact that 



the variational formulation of BP in terms of the Bethe FE functional, discussed earlier in 



Chertkov et al. (2008); Chertkov et al. (2010); Watanabe and Chertkov (2010), and shown 



to be convex by Vontobel ( Vontobel, 2011 ), gives a provable lower bound to the permanent. 
Remarkably, this proof of Gurvits was based on an inequality suggested earlier by Schrijver 
(Schrijver , 1998) for the object naturally entering the exact BP formulas, perm(/3. * (1 — /?)). 



This manuscript contributes two-fold, theoretically and experimentally, to the new syn- 
ergy developing in the field. Theory-wise, we generalize the BP approach to computing 
permanents, suggesting replacing the Bethe FE functional by its fractional generalization 



in the general spirit of |Wiegerinck and Heskes (2003) differing from the Bethe FE functional 
Yedidia et al. (2005) in the entropy term, and then derive new exact relations between the 
original permanent and the results of the fractional approach. (See Theorem 12 ) The new 
object, naturally appearing in the theory, is perm(/3. * (1 — /3).~ 7 ), where 7 E [—1; 1], with 
7 = — 1 corresponding to BP and 7=1 corresponding to the so-called exclusion principle 
(Fermi), but ignoring perfect matching constraints, Mean Field (MF) approximation dis- 



Chertkov et al. ( 


2008 


)• 


particular from Gurvits 



mation, the fractional estimate of the permanent is a monotonic continuous function of the 
parameter 7 with 7 = — 1 and 7 = setting the lower bound (achievable on trees) and an 
upper bound respectively. We also analyze existing and derive new lower and upper bounds. 
We adopt for our numerical experiments the so-called Zero-suppressed binary Decision Di- 
agrams (ZDDs) approach of |Minato| ( |1993| ) (see e.g. Knuth ( |2009| )), which outperforms 
Ryser's formula for realistic (sparsified) matrices, for exactly evaluating permanents, de- 
velop numerical schemes for efficiently evaluating the fractional generalizations of BP, test 
the aforementioned lower and upper bounds for different ensembles of matrices and study 
the special 7, corresponding to the case where the fractional estimation coincides with the 
exact expression for the permanent 

The material in the manuscript is organized as follows: the technical introduction, 
stating computation of the permanent as a Graphical Model, is explained in Section [2] 
and Appendix |Aj The BP-based optimization formulations, approximate methods, iterative 
algorithms and related exact formulas are discussed in Section [3] and Appendices |B||C|D|E 



Section [4] is devoted to the permanental inequalities, discussing the special values of 7 
and conjectures. Our numerical experiments are presented and discussed in Section [5] and 
Appendices F|G|H We conclude and discuss the path forward in Section |6j 



2. Note that methodologically similar approach, of searching for the best/special fractional coefficient, was 
already discussed in the literature by Cseke and Heskes (Cseke and Heskes 2011 ) on an example of the 
Gaussian BP. 
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Figure 1: Illustration of Graphical Model for perfect matchings and permanent. 



2. Technical Introduction 

The permanent of a square matrix p, with non- negative entries denoted pij, p = (pij\i,j 
1, • • • , n), is defined as 



perm(» = J] 



where S n is the set of all permutations of the {1, • • • , n} set. 

An example of a physics problem where computations of permanents are important is 
given by particle tracking experiments and measurements techniques, of the type discussed 
in |Chertkov et aL ( 2008| ); |Chertkov et al. (2010). In this case an element of the matrix, 



p = (pij\i,j = 1, • • • , n), is interpreted as an unnormalized probability that the particle 
labeled i in the first image moves to the position labeled j in the second image. In its most 
general formulation, the task of learning a low dimensional parametrization of the flow from 
two consecutive snapshots consists of maximizing the partition function Z = permfjp) over 
the "macroscopic" flow parameters affecting p. Computing the permanent for a given set 
of values of the parameters constitutes an important subtask, the one we are focusing on in 
this manuscript. 

2.1 Computation of the Permanent as a Graphical Model Problem 

The permanent of a matrix can be interpreted as the partition function Z of a Graphical 
Model (GM) defined over a bipartite undirected graph, Q = (V = (Vi, V2),£), where Vi, V2 
are of equal size, |Vi| = IV2I = n, and Vi, V2, and £ stand for the set of n vertices/labels 
of particles in the first and second images and the set of edges (possible relations) between 
particles in the two images respectively. The basic binary variables, aij — 0, 1, are associated 
with edges, while the perfect matchings are enforced via the constraints associated with 
vertexes, Vi E V\ : Y2jeV2 ai 3 ~ ^ an< ^ ^ ^ ^ 2 : SieVi Gi 3 = 1, as illustrated in Fig. [TJ 
A non-negative element of the matrix, p^-, turns into the weight associated with the edge 
In summary, the GM relates the following probability to any of n\ perfect matchings, 
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a: 



(a) = Z- 1 J] (py) 



(1) 



0,1 



€ £; Vi € Vi : E ^ = 1; Vj € V 2 : E ^ = 1 j 

jev 2 ieVi / 



Z(p) = perm(p) = E JJ (py)* 4 '. 



(2) 



The GM formulation ([!]) also suggests a variational, Kublack-Leibler (KL) scheme for 
computing the permanent. The only minimum of the so-called exact FE functional, 



F(b\p) = Y,b(a)logl ^ 



b(a) 



(3) 



computed under the normalization condition, J2 a b(a) = 1, is achieved at 6(a) = £>(a), and 
the value of the exact FE functional at the minimum over b(a) is — log(Z(p)). Here, general 
and optimal 6(a) are interpreted as the respective proxy and the probability of the perfect 
matching a. 

The relation between the problem of computing the permanent and the problem of 
finding the most probable (maximum) perfect matching is discussed in Appendix [Aj 

2.2 Exact Methods for Computing Permanents 

Computing the permanent of a matrix is a #-P hard problem, i.e. it is a problem which most 
likely requires a number of operations exponential in the size of the matrix. In Appendix 
|H} we experiment and compare the performance of the following two exact deterministic 
ways to evaluate permanents: 

• A general method based on Zero-supressed binary Decision Diagrams (ZDDs), ex- 
plained in more detail in 



Knuth 



Knuth 



(2009). See also detailed explanations below in 



(2009), the ZDDs may be a very efficient practical 



Appendix [Gj As argued in 
tool for computing partition functions in general graphical models. This thesis was 



illustrated by A. Yedidia (Yedidia, 2009) on the example of counting independent sets 
and kernels of graphs. 

A permanent-specific method based on Ryser's formula: 



z(p) = (-ir E (-i) |s| IIE^- 

5C{l,-,n} i=ljeS 

We use code from |TheCodeProject| implementing the Ryser's formula. 

Note that in most practical cases many entries of p are very small and they do not affect 
the permanent of p significantly. These entries do, however, take computational resources 
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if accounted for in the algorithm. To make computations efficient we sparsify the resulting 
matrix implementing the heuristic pruning technique explained in Appendix [FJ 

We also verify some of our results against randomized computations of the permanent 



using the FPRAS from | Jerrum et aL ( 2004| ), with a specific implementation from Chertkov 
et al.l (120081). 



3. Approximate Methods and Exact Relations 

We perform an approximate computation of the permanent by following the general Bethe 



FE approach Yedidia et al. (2005) and the associated Belief Propagation/Bethe-Peierls (BP) 



algorithm, discussed in detail for the case of permanents of positive matrices in Chertkov 
et al.| Q2008D ; |Huang and Jebara| (|2009[); |Chertkov et al.| fl2010p. (S ee also Appendix [B 
reproducing the description of Chertkov et al. (2008); Chertkov et al.| ( [2010 ) and presented 
in this manuscript for convenience.) In our BP experiments we implement the algorithm 



discussed in Chertkov et al. (2008) with a special type of initialization corresponding to the 



best perfect matching of p. We also generalize the BP scheme by modifying the entropy 
term in the Bethe FE. 

In the following subsections we re-introduce the Bethe FE approach, the related but 
different Mean Field FE approach, and also consider a fractional FE approach generalizing 
and interpolating between Bethe/BP and MF approaches. Even though these optimization 
approaches and respective algorithms can be thought of as approximating the permanent 
we will show that they also generate some exact relations for the permanent. 



3.1 Belief Propagation/Bethe-Peierls Approach 

Let us start by defining some useful notations. 

Definition 1 (/3-polytope) Call the /3-polytope of the non-negative matrix p (or just f3- 
polytope for short) the set of doubly stochastic non-negative matrices with elements cor- 
responding to zero elements of p equal to zero, B p — (Aj|Vz : j)eS Pij = ^'^3 '• 
j)e£ fiij — l?Vpij = : fiij — 0). We say that (3 lies in the interior of the [3-polytope, 

f3eBi int) , ifVpij^O: /%^0,1. 

In English, the interior solution means that all elements of the doubly stochastic (3 are 
non-integer, unless element pij = 0, in which case f3ij = 0. 

Definition 2 (Bethe Free Energy (for the permanent)) The following functional of 



Fbp(P\p) = E (A* logOVpy)-(l-0y) log(l-0y)) 



(4) 



conditioned to a given p, is called the Bethe FE of the Belief- Propagation/Bethe-Peierls 
(BP) functional (for the permanent)^ 



3. In the following, and whenever Bethe, MF or fractional FE are mentioned, we will drop the clarifying 
"(for the permanent)" as only permanents are discussed in this manuscript. 
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To motivate the definition above let us briefly discuss the concept of the Bethe FE which 



was introduced in Yedidia et al. (2005) for the case of a general pair- wise interaction GM 



(with variables associated with vertices of the graph). Schematically, the logic leading to 
Eq. Q for the permanent, is as follows. (See |Watanabe and Chertkov (2010) for a detailed 
discussion.) Consider a GM with binary variables associated with edges of the graph. If 
the graph is a tree, then the following exact relation holds, p(a) — Y\ i pi{&i) / Y\(i j) Pij( a ij)i 
where U{ — (aij = 0, l|(z,j) E £). Pi{(Ji) and pijipij) are marginal probabilities asso- 
ciated with vertex i and edge (z, j) of the graph. Replacing the probabilities by their 
proxies/beliefs, p(a) — >> b(a), pi{(Ji) — > bi{(Ji) and pij{pij) biji&ij), substituting the ra- 
tio of probabilities expression for b(a) in the exact FE functional Q, and accounting for 
relations between the marginal beliefs, one arrives at the general expression for the Bethe 
FE functional. This expression for the Bethe FE functional is exact on a tree only, and 



was introduced in Yedidia et al. (2005) as an approximation for GM on a graph with loops. 



When the GM is of a perfect matching type and the graph is equally split bi-partite the 
BP replacement for b(a) becomes 



b B p(cr) 



n 



(i,j)eS:a i: 



=1 @ij 



11(1,^)65:^=0 C 1 ~ A 



(5) 



13) 



Then, substituting b(a) by bBp(cr) in Eq. ^ results in the Bethe FE expression @ for the 
perfect matchings (permanents). Note, that while the exact FE ^ is the sum of 0{n\) 
terms, there are only 0(n 2 ) terms in the Beth e FE Q. 



According to the Loop Calculus approach of Chertkov and Chernyak|(|2006a|[b ), extended 
to the case of the permanent in Chertkov et al. (2008); Chertkov et al. (2010); Watanabe 



and Chertkov (2010), the BP expression and the permanent are related to each other as 



follows: 



Theorem 3 (Permanent and BP, Watanabe and Chertkov (2010)) IftheBPequa 



tions 



V(i,j): (1-Wtj 



Pij 



(6) 



(where (3 is a doubly stochastic matrix and the multipliers, u — (ui\i E Vi){J(u l \i E V2) are 
positive-valued) have a solution in the interior of the /3-polytope, (3 E Bp Ut \ then 



Z = Z B p(p)perm(p.*(l-P)), 



(7) 



where Z BP (p) = -log(F BP (f3\p)). 

An iterative heuristic algorithm solving BP Eqs. ^ for the doubly stochastic (3 efficiently 
is discussed in Appendix |B.2| 

Let us recall that the element of the doubly stochastic /3, is interpreted as 

the proxy (approximation) to the marginal probability for the edge of the bipartite 

graph Q to be in a perfect matching, i.e. should be thought of as an approximation for 

Note also that log(i^) and log(^ J ) in Eqs. ^ are the Lagrange multipliers related to 
the 2n double stochasticity (equality) constraints on /3. 
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3.1.1 BP as the Minimum of the Bethe Free Energy 

Definition 4 (The minimum of the Bethe Free Energy) is defined as 

- log(Z - B p(p)) = F - B p(p) = jmnF B p(P\p), 



(8) 



where Fbp(0\p) is defined in Eq. Q). 



Considered in the general spirit of Yedidia et al. ([2005), F _bp(p), just defined, should 
be understood as an approximation to — log (perm (p )). To derive Eq. Q on e needs to 
replace b(a) by its lower parametric proxy (See |Watanabe and Chertkov| ( |20To| ) for 
more details.) 

The relation between the optimization formulation ([§]) and the BP Eqs. ^ requires 
some clarifications stated below in terms of the following two propositions. 

Proposition 5 (Partially Resolved BP Solutions) Any doubly stochastic f3 solving Eqs. 
^ and lying at the boundary of the (3 polytope, i.e. f3 E B p but f3 £ Bp Ut \ can be reduced 
by permutations of rows and columns of ft (and p respectively) to a block diagonal matrix, 
with one block consisting o/0, 1 elements only and corresponding to a partial perfect match- 
ing, and the other block having all elements strictly smaller than unity, and nonzero if the 
respective pij ^ 0. We call such a solution of the BP Eqs. |6p partially resolved solutions, 
emphasizing that a part of the solution forms a partial perfect matching, and any other 
perfect matching over this subset is excluded by the solution (in view of the probabilistic 
interpretation of (3). A doubly stochastic (3 corresponding to a full perfect matching is called 
a fully resolved solution of the BP Eqs. |^). 

Proof This statement follows directly from the double stochasticity of (3 and from the 
form of the BP Eqs. ([6]), and it was already discussed in Chertkov et al.M2008h; Watanabe 
and Chertkovl (120101) for the fully resolved case. ■ 



Proposition 6 (Minimum of the Bethe FE and BP equations) The minimum of the 
Bethe FE, F _bp(/3) over (3 E B p , can only be achieved at a solution of the BP Eqs. 

Proof This statement is an immediate consequence of the fact that Proposition [5] is valid 
for any p, and so continuous change in p (capable of covering all possible achievable p) can 
only result in an interior solution for the doubly stochastic (3 merging into a vertex of the 
/3-polytope, or emerging from the vertex, but never crossing an edge of the polytope at 
any other location but a vertex. Therefore, we can exclude the possibility of achieving the 
minimum of the Bethe FE anywhere but at an interior solution, partially resolved solution 
or a fully resolved solution (corresponding to a perfect matching) of the BP equations. ■ 



Note, that an example where the minimum in Eq. ([§]) is achieved at the boundary of 
the f3 — polytope (in fact, at the most probable perfect matching corner of the polytope) 



was discussed in Watanabe and Chertkov (2010). In this special case (of very low but finite 
temperatures) there exists no interior solution of the BP equations. 



Another useful and related statement, made recently in Vontobel (2011), is 
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Proposition 7 (Convexity of the Bethe FE, |Vontobel| fl201ip ) The Bethe FE (gp is 
a convex functional of (3 E B p . 

A few remarks are in order. First, the statement above is nontrivial as, considered naively, 
individual edge contributions in Eq. Q associated with the entropy term, fyj log f3ij — (1 — 
Pij) log(l — fyj), are not convex for fyj > 1/2, and the convexity is restored only due to the 
global (double stochasticity) condition. Second, the convexity means that if the optimal 
solution is not achieved at the boundary of B p , then either the solution is unique (general 
case) or the situation is degenerate and one finds a continuous family of solutions all giving 
the same value of the Bethe FE. The degeneracy means that p should be fine tuned to get 
into the situation, and addition of an almost any small (random) perturbation to p would 
remove the degeneracy. To illustrate how the degeneracy may occur, consider an example of 
a (2 x 2) matrix p with all elements equal to each other. We first observe that regardless of 
p for n = 2, the entropy contributions to the Bethe FE are identical to zero for any doubly 
stochastic (2 x 2) matrix, Y^(iJ)' 2 (fiij l°gAj — (1 — Aj)l°g(l — /%)) = 0- Moreover, the 
remaining, linear in /3, contribution to the Bethe FE (which is also called the self-energy 
in physics) turns into a constant for the special choice of p. Thus one finds that in this 
degenerate n — 2 case, 

^ 1-a a 

with any a E [0; 1], is a solution of Eqs. ^ also achieving the minimum of the Bethe FE. 
Creating any asymmetry between the four components of the (2 x 2) p will remove the 
degeneracy, moving the solution of Eqs. ^ achieving the minimum of the Bethe FE to one 
of the two perfect matchings, correspondent to a = and a = 1 respectively. It is clear 
that this special "double" degeneracy (first, cancellation of the entropy contribution, and 
then constancy of the self-energy term) will not appear at all if the doubly stochastic /3, 
solving Eqs. ^ in the n > 2 case has more than two nonzero components in every row 
and column. Combined with Proposition [7J this observation translates into the following 
statement. 

Corollary 8 (Uniqueness of interior BP solution) If n > 2 and an interior, (3 E 
Bp Ut \ solution of Eqs. ^ has more than two nonzero elements in every row and column, 
it is unique^ 

3.2 Mean-Field Approach 

We now define the Mean Field (MF) approximation. 

Definition 9 (Mean Field Free Energy) For (3 E B p , the MF FE is defined as 

Fmf(P\p) = ]T log(/%Mi) + (1 - N) log(l - Pij)) , (9) 



4. In the following, discussing an interior solution, (3 £ Bp nt \ we will also be assuming that it is not 
degenerate, i.e. n > 2 and the solution has more than two nonzero elements in every row and column. 
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and thus the MF equations, defining an internal minimum of Fmf(P\p) over ft £ Bp nt ^ are 

v <«' ef: ft ' = r^W (10) 

Let us make a historical and also motivational remark. MF is normally though of as an 
approximation ignoring correlations between variables. Then, the joint distribution function 
of a complex variable is expressed in terms of the product of marginal distributions of its 
components. In our case of the perfect matching GM over the bi-partite graph, the MF 
approximation constitutes the following substitution for the exact beliefs, 

K°) -> II M*«), (ii) 

into Eq. ([5]). Making the substitution and relating the marginal edge beliefs to f3 according 
to, V(i, j) E £ : bij(l) = fiij, bij(0) = 1 — one arrives at Eq. Because of how 
the perfect matching problem is defined, the two states of an individual variable, — 
and &ij — 1, are in the exclusion relation, and so one can also associate the special form 
of Eq. Q with the exclusion or Fermi- (for Fermi-statistics of physics) principle. Note also 



that Eq. (10) can also be rewritten as 



^y- i#Sr^ (12) 

making comparison with the respective BP Eqs. ^ transparent. 

Obviously, Eqs. (10) and Eqs. (12) describe the (only) minimum of Fmf under the 
condition that (3 is doubly stochastic. An efficient iterative algorithm for solving the MF 
equations is discussed in Appendix |C.2| 

Direct examination shows that (unlike in the BP case) f3 with a single element equal to 
unity or zero (when the respective p element is nonzero) cannot be a solution of the MF 



Eqs. (12) over doubly stochastic (3. This means, in particular that 



Proposition 10 (MF solution is always in the interior) Fmf(P\p) is strictly convex 
and its minimum is achieved at f3 E Bp nt ^ . 

Then, Z _mf(p), defined as —log of the minimum of the MF FE Q, is simply equal 
to Zmf(p), defined as — log Fmf(/3) evaluated at the (only) doubly stochastic solution of 
Eq. @. 

Note also that the MF functional Q can not be considered as a variational proxy for the 
exact FE functional, bounding its from below. This is because the substitution on the right 



hand side of Eq. (11) does not respect the perfect matching constraints, assumed reinforced 



on the left hand side of Eq. (11). In particular the probability distribution function on the 
right hand side of Eq. (11) allows two edges of the graph adjusted to the same vertex to be 
in the active, — 1, state simultaneously, while this state is obviously prohibited by the 



original probability distribution, on the left hand side of Eq. (11) defined only over n\ of 
states corresponding to the perfect matchings. As shown below in Section |4~2l the fact that 
the MF ignores the perfect matching constraints results in the estimation Zmf(p) upper 
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bounding permfjp), contrary to what standard MF (not violating any original constraints) 
would do. 

Finally and most importantly (for the MF discussion of this manuscript), the MF ap- 
proximation for the permanent, Zmf, can be related to the permanent itself as follows: 



Theorem 11 (Permanent and MF) 

Zip) = perm(p) = Z MF (p)verm (/3./(l - /?)) J] (1 - /%), 

(ij)ee 

where (3 is the only interior minimum of 
The proof of this statement is given in Appendix |C.l 



(13) 



3.3 Fractional Approach 

The similarity between the exact BP expression Q and the exact MF expression (13) 
suggests that the two formulas are the limiting instances of a more general relation. Indeed, 
one finds that 

Theorem 12 (Permanent and Mixed Approach) For any non-negative p and doubly 
stochastic f3 which solves 



V(i, j) : 



ft- 



Pij 



(14) 



^(int) 



for 7 E [— 1; 1], and if the solution found is in the interior of the domain, i.e. (3 E Bp , 
the following relation holds 



where 



perm(p) = z| 7 ^(/3|p)perm 



0. 



(1-0). 



■ ■■V 



(15) 



(hi) 



F™(P\p) = - log(Z} 7) (/3|p)) = iPa log(Ai/fti) + 7(1 " Pij) log(l - Pij)) (16) 

(hi) 



The proof of Eq. (16) is given in Appendix D.l, An iterative heuristic algorithm solving 



Eqs. (14) efficiently is described in Appendix D.2 



Fo ll owing the general G M logic and terminology introduced in Wieger inck and Heskes 
(2003); Yedidia et aL| (2005), we call F^\(3\p) the fractional FE. Obviously the two extremes 
of 7 = — 1 and 7 = 1 correspond to BP and MF limits respectively. Many features of the 
BP and MF approaches extend naturally to the fractional case. In particular, one arrives 
at the following statement. 



Proposition 13 (Fractional Convexity, Theorem 60 of Vontobel ( 2011[ )) The frac- 
tional functional defined in Eq. (16), F^\/3\p), is a convex functional, convex over (3 E B p 
for any 7 E [— 1; 1] and any non-negative p. 
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Obviously, this statement generalizes Proposition [7| Also, the following statement becomes 



a direct consequence of Proposition 13 



Corollary 14 (Uniqueness of the interior fractional minimum) // the minimum of 
Fp\p\p) is realized at P e B^ nt) , it is unique. 

3.4 Minimal Fractional Solution 



It is easy to verify that the fractional equations (14) and the fractional FE (16) show the 
same features as their BP and MF counterparts in the 7 < and 7 > cases respec- 
tively. We will call the two regimes BP-like fractional regime and MF-like fractional regime 
respectively. In particular, one finds that 

Remark 15 (Fractional Minima) The minimal fractional solution defined by 

- log(Z^ f (p)) = F^ f (p) = min if (17) 

is achieved at (3 E Bp nt ^ in the MF-like 7 > regime, while in the BP-like 7 < regime, 



the minimum in Eq. ply can be achieved at the boundary of B p . 

It is also of interest to analyze how BP solutions change when p is changed. 
Remark 16 (BP-Fractional & Perfect Matchings) In the BP-like < 0) regime any 



perfect matching is also a solution of Eqs. (14) for a doubly stochastic (3, and any interior 



solution, f3 E Bp Ut \ may merge into the boundary, or emerge from the boundary, under 
a change of p only at one of the corners of the feasible domain corresponding to a perfect 
matching. In other words, f3 crossing an edge of the B v somewhere in the middle (and not 
merging into a vertex) under change in p is not an option. 

Moreover, combining all of the above, one arrives at the following statement valid in the 
whole range of allowed 7 E [— 1; 1]: 

Proposition 17 (7-monotonicity and continuity) For any non-negative p, Z^_f{p) is 
a monotonically increasing function of 7 E [—1; 1]. If additionally, at some 70 E [— 1; 1] the 
solution of Eq. (14) over doubly stochastic (3 is achieved at (3 E Bp int \ then the increase of 



%2-fip) w ith 7 £ [70; 1] is monotonic. 

Proof Observe that for any non-negative p and doubly stochastic P,Yl(i j)es(^~fiij) l°g(l — 
Py) < 0, so for any 71,2 E [-1; 1] such that 71 > 72, F^\p\p) < F^ 2 \/3\p). Then ac- 
cording to the definition of F^ f (p), F^)(p) < F} 7l) (/3|p) < F^ 2 \/3\p), for any doubly 
stochastic /3, in particular for ft which is optimal for 72. Finally, F^J(p) < F^J(p), 
proving monotonicity. The continuity of Z^}^(p) with respect to 7 is a consequence of the 

Ff 7 \p\p) continuity with respect to both 7 E [— 1; 1] and /3 E Bp Ut \ (The intuition with 
respect to the continuity is as follows: an increase in 7 pushes the optimal /3 away from the 
boundary of the B p polytope.) ■ 
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4. Permanent Inequalities, Special Value of 7 and Conjectures 



We start this section discussing in Subsection |4.1| the recently derived permanent inequal- 
ities related to BP and MF analysis. Then, we switch to describing new results of this 
manuscript in Subsection 4.2, which are mainly related to the fractional generalizations of 
the inequalities discussed in Subsection |4.1| We also discuss in Subsection |4.2| the special 
(and p-dependent) value of the fractional coefficient 7 for which permfjp) is equal to Zj_^(p). 
Finally, Subsection |4.3| is devoted to discussing conjectures which resolutions should help 
to tighten bounds for the permanent. 



4.1 Recently Derived Inequalities 

In this subsection we discuss a number of upper and lower bounds on permanents of pos- 
itive matrices introduced recently. Our task is two- fold. First, we wish to relate the 
bounds/inequalities to the BP and MF approaches introduced and discussed in the pre- 
ceding section. Some of these relations and interpretations are new. However, we also aim 
to test these bounds, and specifically to characterize the tightness of the bounds by testing 
the gap as a function of advection and diffusion parameters in the 2d diffusion+advection 
model in Section [5j 

The first bound of interest is 



Proposition 18 (BP lower bound) For any non-negative p 



perm(p) > Z _ BP (p). 



(18) 



This statement, as an experimental but unproven observation, was made in Chert kov et al. 



(2008). It was stated as a theorem (Theorem # 14) in Vontobel (2010), but the proof was 



not provided. (See also discu ssion in |Vontobel| ( [MIT] ) following Theorem 49/Corollary 50.) 
The statement was proven in Gurvitsj ( |2011 ). Interpreted in terms of the terminology and 
logic introduced in the preceding Sections, the proof of Gurvits ( 2011| ) consists (roughly) in 
combining the inequality by Schrijver ( Schrijver] |1998 ) 



perm(/3. * (1 - 0)) > - 



(19) 



stated for any doubly stochastic /3, with some (gauge) manipulations/transformations of 
the type discussed above in Sections 3.1.1 We will give our version of the proof (similar to 
the one in Gurvits (2011) in spirit, but somewhat different in details) in Appendix |E| One 
direct corollary of the bound (18) discussed in Gurvits (2011), is that 

Corollary 19 For an arbitrary doubly stochastic <j) 



perm(/3) > Z . BP (<f>) >]J(l- frj) 1 '^ ■ 
Next, the following two low bounds follow from analysis of Eq. 



(20) 
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Proposition 20 (BP+ lower bound #1) For any non-negative p and doubly stochastic 
f3 G Bp Ut ^ solving Eqs. (Q) (if the solution exists) results in 



perm(p) > Z BP (/3\p) J](l - — . 



This is the statement of Corollary 7 of |Watanabe and Chertkov (2010) valid for any interior 



point solution of the BP-equations, and it follows from the Gurvits-van der Waerden theorem 



( Gurvits , 


2008; 


Laurent and Schrijver, 


Chertkov 


(2010 





Proposition 21 (BP+ lower bound #2) For any non-negative p and (3 G Bp nt ^ solving 
Eqs. |6p (if the solution exists) results in 

Z > 2Z BP (P\p)([[(l - Pij))- 1 II/W 1 - An«), 



where II is an arbitrary permutation. 



This statement was made in Theorem 8 in Watanabe and Chertkov ( |2010 ) and it is also 
related to an earlier observation of Engel and Schneider (1973). (Proof of the theorem in 



Watanabe and Chertkov (2010) contained a misprint that was later corrected.) 



Proposition 22 (BP+ upper bound) For any non-negative p and (3 G Bp nt ^ solving 
Eqs. |6p (if the solution exists) 



perm(p) < Z BP (f3\p)( ]J (1 - A,))" 1 J^C 1 " E^- 



(21) 



This statement was made in Proposition 9 of Watanabe and Chertkov (2010). 



4.2 New Bounds and Special 7 

Of the bounds discussed above, three are related to BP and one to MF, while as argued 



in Section |4.3| the fractional approach interpolates between BP and MF. This motivates 
exploring below new fractional generalizations of the previously known (and discussed in 
the preceding Subsection) BP and MF bounds. 

We first derive new lower bound generalizing Proposition [20] to the fractional case. 



Proposition 23 The following is true for any doubly stochastic (3 and any 7 G [—1; 1] 



perm(/3.*(l-/3).^)> — f](l-A 



Proof This bound generalizes Corollary 7 of Watanabe and Chertkov (2010), and it follows 



directly from the Gurvits-van der Waerden theorem Gurvits (2008); Laurent and Schrijver 



( 2010| ) (see also Proposition 8 of Watanabe and Chertkov (2010), where a misprint should be 



14 



Computing the Permanent with BP 



corrected n n /n\ —> n!/n n ), and the inequality, Y2j 1 Xj > Ylj (0- ~ Pij) 7 x ^f 



"13 



Then, combining Proposition 23 with Theorem 12, one arrives at the following statement, 



generalizing Proposition 20 



Corollary 24 (fractional+ low bound) For any non-negative p and (3 E Bp solving 
Eqs. (Tfy (if the solution exists), the following lower bound holds 

perm(p) > Zp\(3\p)^ R (1 " ft;)* 1 "^. (22) 

Next one arrives at the following fractional generalization of Proposition [22) 
Corollary 25 (fractional+ upper bound) For any non-negative p and (3 E Bp nt ^ solv- 



ing Eqs. p^p (if the solution exists), the following upper bound holds 
perm(p)<Z«(/%)( J] (l-^nE^ 1 ^ 



(hj)es 



This upper bound follows from combining Theorem [12| with the simple (and standard) 
upper bound, perm(A) < rij(Si^,j) applied to A = /3. * (1 — /3).~ 7 . 

Note that Corollary [25j applied to the 7 = case and reinforced by the observation, that 
for 7 > the minimum of the fractional functional (16) is achieved in (3 E Bp Ut \ translates 
into 

perm(p) < Z^T f °\p)- (23) 



Combined with Proposition [17| Eq. ( [23] ) results in the following: 
Corollary 26 (7 > upper bound) For any non-negative p 

V7 > : pevm(p) < Z^}^{p). 

This completes the list of inequalities we were able to derive generalizing the BP and MF 
inequalities stated in the preceding Subsection for the fractional case. These generalizations 
are valid for any 7 E [0; 1]. Therefore, one may hope to derive somewhat stronger statements 
reinforcing the continuous family of inequalities with the mononotonicity of the fractional 
approach stated in Proposition [17| 

Indeed, combining Eqs. Jl8] ) with Propositions 17|26 



one arrives at 



Proposition 27 (Special 7*) For any non-negative p there exists a special 7* E [— 1;0], 
such that perm(p) = Z^}^(p) 7 and the minimal fractional solution upper (lower) bounds the 
permanent at > 7 > 7* (— 1 < 7 < 7*^. 



Proposition 27 motivates our experimental analysis of the 7*(p) dependence discussed in 
Section [U 



Note also that due to the monotonicity stated in Proposition [T7j the 7 = upper bound 
on the permanent is tighter than the MF, 7 = 1, upper bound. However, and as discussed 
in more details in the next Subsection, even the 7 = upper bound on the permanent is 
not expected to be tight. 
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4.3 Conjectures 



It was conjectured in Vontobel| (2010) that 



permO) < Z _ BP (p) * /(ra), 



(24) 



and also that / ~ y^. The second part of the conjecture was disproved by Gurvits (2011) 



with an explicit counter-example. The inequality in Eq. (24) turns into the equality f(n) = 
\/2 n when p is doubly stochastic and block diagonal, with all the elements in the 2x2 
blocks equal to 1/2^] Then it was conjectured in Gurvits (2011) that 



Co njec ture 28 (BP upper bound Gurvits ( 2011[ )) For any non-negative p, f(n) in 
Eq. is ~ V2 n . 



Another related (but not identical) conjecture of Gurvits (2011) is as follows: 
Conjecture 29 For any doubly stochastic n x n matrix <j) 



perm(0) < JJ(1-- 



(25) 



Note that if Eq. (25) is true it implies according to Linial et al. (1998) a deterministic 
polynomial-time algorithm to approximate the permanent of n x n nonnegative matrices 
within the relative factor y/2 . 

It can be verified directly that the special matrix (with "doubly degenerate" blocks) for 



which the condition (25) is achieved (i.e. inequality is turned into equality) also results 



in with 7 = —1/2 on the right hand side of Eq. (|25|). Therefore one reformulates 



Conjecture 28 



as 



Conjecture 30 For any non-negative p 



perm(p) < Z^_ 



-1/2 



(P)- 



We refer an interested reader to |Vontobel ( |2011 ) for discussion of of some other conjectures 
related to permanents. 



5. Experiments 

We experiment with deterministic and random (drawn from an ensemble) non-negative 
matrices. 

Our simple deterministic example is of the matrices with elements taking two different 



values such that all the diagonal and all the off-diagonal elements are the same Watanabe 



and Chertkov (2010). 



In our experiments with stochastic matrices we consider the following four different 
ensembles 



5. Note that this special form of the 2x2 block corresponds to the "double degeneracy" discussed in the 
paragraph preceding Corollary [8] 
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(Ai n , A out ). Ensemble of matrices motivated by Chertkov et al. (2008); Chertkov et al. 



(2010) and corresponding to a mapping between two consecutive images in 2d flows 
parameterized by the vector A = (a, 6, c, «), where k is the diffusion coefficient and 
(a, 6, c) stand for the three parameters of the velocity gradient tensor (stretching, 



shear and rotation respectively - see |Chertkov et al.| ( |2010| ) for details). In generating 
such a matrix p we need to construct two sets of A parameters. The first one, Ai n , 
is used to generate an instance of particle positions in the second image, assuming 
that particles are distributed uniformly at random in the first image. The second one, 
A ut 5 corresponds to an instance of the guessed values of the parameters in the learning 
problem, where computation of the permanent is an auxiliary step. (Actual optimal 
learning consists in computing the maximum of the permanent over A out .) In our 
simulations we test the quality of the permanent approximations in the special case, 
when Ai n = A out , and also in other cases when the guessed values of the parameters 
do not coincide with the input ones, Ai n ^ A ou t- 

[0; p]-uniform. In this case one generates elements of the matrix independently at 
random and distributed uniformly within the [0; p]-range. 

6 -exponential In this case one generates elements of the matrix independently at ran- 
dom. Any element is an exponentially distributed random variable with mean 5. 

( 1/2 1/2 \ 

[0; p]- shifted We generate the block diagonal matrix with I , , I blocks and 



1/2 1/2 

add independent random and uniformly distributed in the [0; p) interval components 
to all elements of the matrix. The choice of this ensemble is motivated by the special 
role played by the (doubly degenerate) block-diagonal matrix in the Gurvits conjecture 
discussed in Section l4~3l 



To make the task of the exact computation of the permanent of a random matrix 
tractable we consider sparsified versions of the ensembles defined above. To achieve this 
goal we either prune full matrix from the bare (i.e. not yet pruned) ensemble, according to 
the procedure explained in AppendixjFj or in the case of the [0; p]-uniform ensemble we first 
generate a sufficiently sparse sub-graph of the fully connected bipartite graph (for exam- 
ple picking a random subgraph of fixed O(l) degree) and then generate nonzero elements 
corresponding only to edges of the sub-graph. 

5.1 Deterministic Example 

We consider a simple example which was already discussed in |Watanabe and Chertkov 



(2010). Permanent of the matrix p with elements 

where w > 1 and T > 0, can be evaluated through the recursion, 

n ~ k),T {t) Dk > 



k=0 
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Figure 2: Illustration for the case of the deterministic matrix (26). Fig (a) shows the gap 
between the exact permanent and its lower bound estimate by Eq. ( |24| ). Fig. (b) 
shows dependence of the special 7* on the parameters. 



Dq = 1, D\ = 0, and Vfc > 2, — (k — 1){D^_\ + D&-2)- On the other hand, seeking for 
solution of the fractional Eqs. (fl4|) in the form of a doubly stochastic /3, where 



one finds that e should satisfy the following transcendental equation, 

(1 - e(n - 1))(1 - e) 7 = w l ' T (n - 1)V +7 . 

At T — >> oc this equation has a unique uniform, e —± l/(n — 1), solution. An interior, e > 0, 
solution of Eq. ( |27| ) exists, and it is also unique, at any finite T for non- negative 7 and at 
T > log wj log (n — 1) for negative 7. 

To test the gap between the exact expression for the permanent and the fractional lower 



bound of Corollary [24J we fix w = 2, n = 20 and vary the temperature parameter, T. The 
results are shown in Fig.[2^i. One finds that the gap depends on 7 with 7 = resulting in the 
best lower bound for all the tested temperatures. One also observes that the 7-dependence 
of the gap decreases with increase in T. Fig. [2)3 shows dependence of the special 7*, defined 
in Proposition [27l on n and T at w = 2. One finds that, consistently with the Conjecture 



30, 7* is always smaller than —1/2 and it also decreases with increase in either n or T. 



5.2 Random Matrices. Special 7*. 

We search for the special 7 = 7*, defined in Proposition [27| by calculating the permanent 
of a full matrix, of size n x n, with n = 3, . . . , 14, and of a pruned matrix of side-length in 
the 10 to 40 range, and then comparing it with the fractional value Z^\f3,p) j^, where the 

6. In the following we will use the shorter notation, zi 1 ^ for this object. 
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doubly stochastic (3 solves Eqs. (14) for given for different 7. By repeatedly evaluating 



the fractional approximation for different values of 7 and then taking advantage of the 
monotonicity and continuity with respect to 7 and performing a search we find the special 
7 for a specific p. 

In general we observed that the special 7* for tested matrices was always less than or 



equal to —1/2, which is consistent with Conjecture 30 We also observed, estimating or 
extrapolating the approximate value of the special 7* for a given matrix, that it might be 
possible to estimate the permanent of a matrix efficiently and very accurately for some 
ensembles. 

5.2.1 The (A in ,A ou t) ensemble 

In this subsection we describe experiments with several of the (Ai n , A ou t) ensembles. We are 



interested in studying the dependence of the special 7*, defined in Proposition [27j on the 
matrix size and other parameters of the ensemble. We consider here a variety of cases. 

Fig. [3] shows the results of experiments with full but (relatively) small matrices and 
different values Ai n ,A out . The results are presented in the form of a scatter plot, showing 
results for different matrix instances from the same ensemble. 

As can be seen from the grouping of the first five plots in Fig. [3j the dependence of the 
special 7* on the matrix size at Ai n = A ou t is largely sensitive to the diffusion parameter 
k and it is not so dependent on the advection parameters a, 6, c. Indeed, Figs. [3] (a,b) are 
similar to each other, as are Figs. [3^c-e), despite having different values of a, 6, c. 

Figs.|3ja-e), along with Figs.|3jf,g), also demonstrate an interesting feature: the lower as, 
the more erratic the behavior of the special 7*, with Figs.[3jf,g) demonstrating this tendency 
at its extreme. With low diffusion, matrices were dominated by the largest permutation 
and search for the special 7* became less meaningful, with seemingly random behavior. 

Analyzing the three last cases in Fig. [3] with Ai n 7^ A out , we observed that the larger the 
value of Kouti the more regular the resulting behavior. 

Fig. [4] shows the same scatter plots as in Fig. |3| observed for larger but sparser (90% 
pruned) matrices. We observed the general tendency for the average special 7* to decrease 
with increasing n; however, it is not clear from the observations if the resulting level of 
fluctuations decreases with the increase in n or remains the same. 

Summarizing, for the (Ai n , A ou t) ensemble, we found that the behavior of the special 7* 
with respect to matrix size to be largely dependent on K out , the diffusion coefficient used to 
generate the matrix, while the dependance of other factors is significantly less pronounced. 
The average special 7 decreases with increasing n, while respective variance remains roughly 
the same. 

5.2.2 Uniform and 5-exponential ensembles 

Fig. [5] shows scatter plots for examples of the (a) [0; l]-uniform ensemble, and (b) 5- 
exponential ensemble. Here we found a very impressive decrease in variance with increase in 
the matrix size. Besides, we observe that in spite of their difference, the two ensembles show 
qualitatively similar behavior of 7* as a function of n. This indicates that for large matri- 
ces, whose entries are independent random variables, we could achieve excellent accuracy 
by extrapolating on the special 7* and estimating (7*) = perm(p). Indeed, the rapidly 



19 



Chertkov and Yedidia 




12 16 20 




8 12 16 20 



(a) A in = Aout = (1, 1, 1, l)/2 (b) A in = Aout = (2, 2, 2, 1/2) 




4 

fc) A 



in — Aout 



12 16 20 

= (0,0,0,1) 



4 

(d) A in 



8 12 16 20 

A ou t = (1,1,1,1) 




4 8 12 16 20 

(e) A in = Aout = (2, 2, 2,1) 



-0.5 
-0.6 
-0.7 
-0.8 



I t : 1 ■ 1 • 

" ! : 1 1 ! 



-0.5 
-0.6 
-0.7 
-0.8 



: ' ' i 

• ! • ! 



II 



-0.5> 
-0.6 j 
-0.7 
-0.8 



4 8 12 16 20 4 8 12 16 20 4 8 12 16 20 

(f) Ain = Aout = (1, 1, 1, l)/4 (g) Ain - Aout = (0, 0, 0, 1/10) (h) Ain = Aout = (0, 0, 0, 2) 



-0.5> , 
-0.6 j 



i>. 



-0.7 .,| 



-0.8 



■lliiii, 
'MM 



(i) A in = 
(0,0,0,1) 



8 12 16 20 

(1,1, 1,1), Aout = 




(j) A in = 
(0,0,0,1) 



8 12 16 20 4 

(0,0, 0,2), A out = (k) A in = 
(0,0,0,2) 



12 16 20 

(0,0, 0,1), Aout = 



;ure 3: Scatter plot of the special 7* calculated for instances from the (Ain, A ou t) ensemble 
and varying the matrix size within the 2 4- 14 range (no pruning). 
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ure 4: Scatter plot of the estimated special 7* calculated for instances from the (Ai n , A ou t) 
ensemble and varying the matrix size within the 15-1-40 range (with 90% pruning). 



-0.5 
-0.6 I 
-0.7 " 1 

■ 

-0.8 

4 8 12 16 20 4 8 12 16 20 

(a) uniform ensemble. Full (b) ^-exponential ensemble with 
(small) matrix. 5=1. Full (small) matrix. 

ure 5: Scatter plot of the estimated special 7 calculated for instances from a random 
matrix ensemble and varying the matrix size. 



-0.5' ^ 

-0.6 j . 
1 

-0.7 1 
-0.8 



21 



Chertkov and Yedidia 




12 16 20 



(a) p = 1/20-shifted 



4 8 12 16 20 



(b) p= 1/200-shifted 



Figure 6: Scatter plot of the estimated special 7* calculated for instances from two examples 
of the [0; p]-shifted ensembles and varying the matrix size. 



decreasing variance suggests that most of the error would come from the extrapolation of 
the average special 7* for a given matrix size and not the error of ^/(7average) — ^/(Tactual)? 
since 7 av erage and 7 ac tuai will be very close in value. 

5.2.3 The [0; p]-shifted ensemble 

We also studied the special 7* in "badly-behaved" cases such as the one brought up earlier, 



with 2x2 squares of 1/2's positioned along the diagonal. (See discussion in Section 4.3.) It 
can be easily shown that the special value of 7* of the bare block-diagonal matrix is —1/2. 
Unsurprisingly, our experiments, documented in Fig. [6j showed that: (a) the resulting 7 is 
always smaller than —1/2, and (b) as more noise was introduced, the special 7* decreased 
in value faster with respect to matrix size. However, this decrease with n towards smaller 
7* was much slower than in other ensembles, particularly for low noise. 

5.3 Random Matrices: Testing Inequalities and Conjectures 



Figs. |7|8| and Figs. 9|10[ showing average behavior and scatter plots for smaller and larger 



(pruned) matrices respectively (see figure captions for explanations), present experimental 
verification to the variety of inequalities discussed in Section |4.1| The ensemble used for 
these plots was the ensemble Ai n = A ou t = (1, 1, 1, l)/2. The data suggests that neither 
of the bounds are actually tight, and moreover the values of the gaps, between the exact 
expression and respective estimates tested, fluctuate more strongly with increasing matrix 
size. We also observe from Figs. [7] and Figs. [8]f, that Eq. (24) has f(n) growing faster with 



n than ~ y/n even on average. In the case of larger pruned matrices we removed the two 
expressions \og((Z BP (U {i>j)e£ (l - Aj)) -1 IljC 1 " E,(/%) 2 ))/^) and log((2Z B p(n < j(l " 
Ili /^^(l — (if'^))/Z). We removed the former because in the case of pruning the 
resulting (3 is often partially-resolved (with some elements of /3 equal to one) and in this case 
the inequality does not carry any restriction. We removed the latter because, in the pruned 
case and for a randomly chosen permutation, it is very likely that at least one element of (3 
is zero, making the bound discussed unrestrictive. 



Fig. 11 shows that the bounds given by the Corollaries 24, E5^do not depend much on 



7 and that they in practice depend more on matrix size and on peculiarities of individual 
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Number of particles 



14 



Figure 7: This figure describes the gap between the actual value of the permanent and 
the values of the various theoretical upper and lower bounds described in the 
paper and averaged over different simulation trials. Each color corresponds to 
an upper or lower bound as follows: Blue corresponds to the MF approxima- 
tion to the permanent, Z^f, as described in Proposition ??. Green corre- 
sponds to log(Z BP (Yl {ij)eS (l - fiij^Uji 1 - J2i(Pij) 2 )/ z )- R ed corresponds 
to log(Z BP Y\ij(l ~ faj^-^/Z). White corresponds to log(Z BP /Z). Yellow 
corresponds to log(2ZBp(Yl i j(l — Aj)) -1 1L/^(1 ~~ PD/Z)- Purple corresponds 
to log(0.01ZBpy/n/Z). Cyan corresponds to log(Zj~° /Z). Black corresponds to 

log(Zj --0 ' 5 /Z). To make a data point, 100 instances, each corresponding to a 
new matrix are drawn, and the log of the ratio of the bound to the actual per- 
manent is recorded. The data shown corresponds to matrices from the (Ai n , A ou t) 
ensemble. 
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Figure 8: Scatter plots for the data shown in Figs. Q. For better presentation the data 
is split into 8 sub-figures. The vertical axis of each scatter plot is specific to the 
behavior of expression with respect to the matrix size and the color coding for 
different objects tested coincides with that used in Fig. [7| 
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Figure 9: The data is shown like in Fig. [7j but for large, sparsifled matrices. Less mean- 
ingful expressions were removed from the plot. Each color corresponds to a 
mathematical expression, as follows, white: log(Z#p/Z), red: \og(Zsp Y\i j(l — 
Bf^Bi - 1) • n\/{n n Z)\ blue: \og{Z MF /Z), purple: log(Z B pVn/(100Z)), cyan: 
log(Z/(7 = 0)/Z), black: log(Z/(7 = -l/2)/Z); where Z = perm(p). 
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ure 10: Scatter plots for the data shown in Figs. For better presentation the data is 
split into 6 sub-figures. Each scatter plot is specific to the behavior of expression 
with respect to the matrix size and the color coding for different objects tested 
coincides with the one used in Fig. [9) 




o.o 
Gamma 



ure 11: This plot shows 7-dependence of the gap between upper and lower bounds cor- 



responding to Corollaries [24J [25) Here, the bounds were plotted for six different 
matrices, each generated with Ai n = A out = (1, 1, 1, l)/2, where the upper and 
lower bounds are color-coded to indicate that they correspond to the same ma- 
trix. 
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Figure 12: A plot of the average (Subflgure a) and standard deviation (Subflgure b) of 

perm(/3)/ Yli j(l — ( m blue) and perm((3) /Z -bp(P) (in red), where (3 

is a doubly stochastic matrix picked from 100 instances of the random ensemble 
described in the text, shown as a function of n. 



matrices. There is a slight change for values of 7 near —1, but otherwise the plot is nearly 
flat, so it seems that unfortunately little tightening of the bounds can be achieved by 
tweaking 7. Another noteworthy observation is that a higher upper bound implies a higher 
lower bound, and vice-versa. 



Fig. [12] is related to discussions of Corollary [19] for the permanent of a doubly stochastic 
matrix. We generate an instance of a doubly stochastic matrix and calculate the respective 
BP expression in three steps (this is the procedure of Knopp and Sinkhorn| ( |1967[ ), also 
discussed by Huang and Jebara| ( |2009 )) : (a) generate a non-negative matrix from the [0; 1] 
ensemble; (b) re-scale rows and columns of the matrix iteratively to get a respective doubly 
stochastic matrix Q and (c) apply the BP- (7 = —1) procedure to evaluate the Z q _bp 



estimate for the resulting doubly stochastic matrix. In agreement with Eq. (20), the average 
value of the log corresponding to the BP-lower bound is positive and smaller than the 
respective expression for the average of the log of the explicit expression on the right hand 
side of Eq. ( |20| ). (The hierarchical relation obviously holds as well for any individual instance 
of the doubly stochastic (3 from the generated ensemble.) We also observe that the average 
values of the curves show a tendency to saturate, while the standard deviation decreases 
dramatically, suggesting that for large n this random ensemble may be well approximated 
by either BP or, even more simply, by its explicit lower bound from the right hand side of 



Eq. (20), the latter being in the agreement with the proposal of Gurvits (2011). 



6. Conclusions and Path Forward 



The main message of this and other related recent papers by Chertkov et al. 


(2008 


) ; |Huang 


and Jebara ( 


2009 


); 


Chertkov et al. (2010 


); 


Watanabe and Chertkov (2010); 


Vontobel 


(2010, 



2011); Gurvits (2011) is that the BP approach and improvements not only give good heuris- 



tics for computing permanents of non-negative matrices, but also provide theoretical guaran- 



7. The rescaling is a key element of 

version of the 7 = iterative algorithm. 



Linial et al. 



(1998), and we can also think of the procedure as of a 
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tees and thus reliable deterministic approximation. The main highlights of this manuscript 
are 

• The construction of the fractional approach, parameterized by 7 E [— 1; 1] and inter- 
polating between BP (7 = —1) and MF (7 = 1) limits. 



• The discovery of exact relation between the permanent of a non-negative matrix, 
perm(p) and resp€ 
tionally tractable. 



perm(p) and respective fractional expression, Z^\p), where the latter is computa- 



• The proof of the continuity and monotonicity of Z < j i \p) with 7, also suggesting that 
for some 7* E [— 1;0], perm(p) = Z^*\p). 

• The extension of the list of known BP-based upper and lower bounds for the permanent 
by their fractional counterparts. 

• The experimental analysis of permanent s of different ensembles of interest, including 
those expressing relations between consecutive images of stochastic flows visualized 
with particles. 

• Our experimental tests include analysis of the gaps between exact expression for the 
permanent, evaluated within the ZDDs technique adapted to permanents, and the 
aforementioned BP- and fractional-based lower/upper bounds. 

• The experimental analysis of variations in the special 7 for different ensembles of 
matrices suggests the following conclusions. First, the behavior of the special 7 varies 
for different ensembles, but the general trend remains the same: as long as there is 
some element of randomness in the ensemble, the special 7 decreases as matrix size 
increases. Second, for each ensemble the behavior of the special 7 is highly distinctive. 
For some considered random matrix ensembles the variance decreases quickly with 
increasing matrix size. All of the above suggest that the fractional approach offers a 
lot of potential for estimating matrix permanents. 

We view these results as creating a foundation for further analysis of theoretical and 
computational problems associated with permanents of large matrices. Of the multitude of 
possible future problems, we consider the following ones listed below as the most interesting 
and important: 



Improving BP and fractional approaches and making the resulting lower and upper 
bounds tighter. 

Further analysis of the 7-dependence, making theoretical statements for statistics of 
log-permanent s at large n and for different random ensembles. 

Utilizing the new permanental estimations and bounds for learning flows in the setting 



of Chertkov et al. (2010). Combining within the newly introduced fractional approach 
the /3-optimization with optimization over flow parameters (by analogy with what is 
done in Chertkov et al.| Q2010| )). Applying the improved technique to various Particle 



Image Velocimetry (PIV) experiments of interest in fluid mechanics in general, and 
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specifically to describe spatially smooth multi-pole flows in micro-fluidics, see e.g. 



discussion of the most recent relevant experiments in Drescher et al. (2010); Guasto 
et al. (2010) and references therein^] 



Addressing other GM problems of the permanental type, e.g. counting matchings 



(and not only perfect matchings) on arbitrary graphs (drawing inspiration fromjSang- 
|havi et aL (2011) generalizing |Bayati et al." ( 2008| ) in the ML setting) and higher- 
dimensional matchings, in particular corresponding to matching of paths between 
multiple consecutive images within the "learning the flow" setting. 
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Appendix A. Most Probable Perfect Matching 

This short appendix is introduced to guide the reader through material which is related, 
but only indirectly (through physics motivation and historical links), to the main subject 
of the manuscript. 

A.l Most Probable Perfect Matching over Bi- Partite Graphs 

According to Eq. Q, the permanent can be interpreted as the partition function of a GM. 
The partition function represents a weighted counting of the n\ of perfect matchings. Using 
"physics terminology" one says that this perfect matching representation allows to interpret 
the permanent as the statistical mechanics of perfect matchings (called dimers in the physics 
literature) over the bi-partite graph. This is statistical mechanics at finite temperatures, as 
the partition function represents a (statistical) sum over the perfect matchings. 

However, it is still of interest to discuss (at least in the context of establishing historical 
links) the "zero temperature" , or Maximum Likelihood (ML) version of Eq. (§ 

- \og(Z ML (p)) = mm ^ <Tijl°g0-/Pij)- ( 28 ) 

(ij)ee 



8. We are thankful to Eric Lauga for suggesting to us the micro-fluidics experiments as one possible appli- 
cation for the "learning the flow" BP-based approach. 
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According to the logic of Yedidia et al. (2005), Eq. (28) can also be stated in the probability 



space (i.e. in terms of b(cr)) as 



\og(Z M L(p)) = min ^ ^ a {j log(l/^)6(cr) 



(29) 



Then, the functional of b(a) which is the object of minimization over beliefs in Eq. (29) is 
naturally the ML (zero temperature) version of the FE Functional p. 



By construction, Zml{p) < Z(p) f° r any p. Note also that Eq. (28) is a Linear Program- 
ming (LP) equation, but one which at first sight appears intractable, giving an optimization 
defined over a huge polytope and spanning all the perfect matchings with nonzero proba- 
bility. For a general GM the LP-ML formulation is indeed intractable, but for the specific 
problem under consideration (finding the perfect matching over a bipartite graph) the ML- 
LP problem ( |29| ) becomes tractable, as discussed below in the next Subsection. Given 
classical results in the optimization theory, related to the so-called Hungarian algorithm, 
by Kuhn| ( |1955[ ), and the auction algorithm, by Bertsekas (1992), this special solvability 
(reduced complexity) of the ML perfect matching problem is not surprising. 



A. 2 Linear Programming Relaxation of BP 

The Bethe FE ^ can be split naturally into the self-energy term and the self-entropy terms 
(at unit temperature), Fpp — Epp — Sep'- 



E BP (f3\p) = -Y / f3 ij \og(p ij ), 

Sbp(P\p) = lo g(^i) + (* - Pa) lo ^ 1 ~ 

(iJ) 



(30) 



If the entropy term is ignored in Eq. ([8]) the problem turns into the Linear Programming 
(LP) formulation of BP 



\og(Z LP (p)) 



min E B p(/3). 



(31) 



One can also arrive at the same LP formulation (31 ) relaxing the original ML-LP setting 



(29). As shown in Bayati et al.| ( |2008 ); Chertkov (2008), the relaxation is provably tight 
for any p , i.e. Zpp{p) = Zml(p), as the resulting matrix of constraints in the LP problem 



(31) describing the doubly stochasticity of (3 is totally uni- modular, so the corners of the 



respective polytope are in one-to-one correspondence with the perfect matching configura- 



tions/corners of the higher-dimensional polytope from Eq. (29), also in accordance with the 



Birkhoff-vonNeumann theorem by |K6nig (1936); Birkhoff (1946); von Neumann 



Appendix B. Bet he- Free Energy Approach 
B.l Exact BP-based Relations for Permanents 

We present here a simple proof of Eq. Q, essentially followi ng a slightly modified version 
of what was the main statement of Watanabe and Chertkov (2010). 
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Consider an interior minimum of the Bethe FE functional Q achieved with a strictly 
nonzero (for elements with positive pij) doubly stochastic (3. Then, the minimum satisfies 
Eqs. ([6]), where log('u) are respective Lagrangian multipliers. Weighting the logarithm 
of Eqs. ^ with /3, summing up the result over all edges, using Eq. ^ and the double 
stochasticity of /3, one derives 

^2 Pij \0g(UiUj) = ^ l °% U i + XI l ° gVjJ 

(ij)es ieVi jev 2 

= (hi l°s(Pij/Pij) - Pij lo g(! " Pij)) = lQ g Zbp ~ ~ P^)- (32) 

(ij)ee (ij)es 

On the other hand, applying the permanent to both sides of Eq. ^ one arrives at 

perm(p) = perm(/3. * (1 - /?)) I ]J u { J I JJ v? \ . (33) 



Combining Eq. ([32]) with Eq. ([33]) results in Eq. Q 



B.2 Iterative Algorithm(s) for finding solution of BP equations 

First of all, let us recall that according to Proposition [7] ^ is convex. However, as explained 
above the convexity is not trivial, as it is enforced by global constraints. This lack of 
convexity of individual edge-local terms in Eq. ^ creates a technical obstacle to finding 
a valid fixed point of Fbp, suggesting that an iterative algorithm converging to the fixed 
point of Fbp will be more elaborate than the one discussed below in the MF case. 

To find a valid solution of BP in our numerical experiments we use the following practical 
iterative scheme (heuristics), previously described in Chertkov et al. ( 2008| ) (see Eqs. (7,8) 
as well as preceding and following explanations): 



V(i,j): /%(n+l) = A/%(n) + - 



(1 - X)pij 

H] yn-r,, ^W^ p .. + ( EfcAi ( n )/2 + Sfc/3 , fc (n)/2-A,(n)) 2 K(nV(n))' 

(34) 

Vi- u (n I 1) - ^kPik/u k {n) j( , _ EkPkj/Mn) , . 

VZ ' * ( + } ~l-E,(/%(n)) 2 ' J ' ( + 1,_ l-»)) 2 ' ( J 

where the arguments of the /3's indicate the order of the iterations. The damping parameter 
A (typically chosen 0.4 0.5) helps with convergence. To ensure appropriate accuracy for 



solutions with /3's close to zero or unity we also insert a normalization step after Eqs. (34) but 



prior to Eqs. (35), making the following two transformations subsequently, (a) V(i, j): —> 



Pij/^kPi' an d (b) V(z,j): flij — > Pij/YlkPk' (The two steps implement an elementary 



step of the Sinkhorn operation from |Huang and Jebara| (2009).) The algorithm is sensitive 
to initial values for (3 and u. To ensure convergence, one initiates the algorithm with 
the output of the MF scheme (which converges much better) as described in Appendix 



C.2, i.e., /3(0) = /3mf and u(0) = vmf- Numerical experiments show that this procedure 
always converges to an interior stationary point of the BFE Q (when one exists and is not 
degenerate). 
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Note, that the algorithm presented above is certainly not the only option one can use 
to find a doubly stochastic solution of BP Eqs. In fact, the standard Sum-Product 
Algorithm (SPA) of Yedidia et al. (2005), stated for the problem of computing the perma- 
nent in Chertkov et al. (2010), is a serious competitor, which according to Theorem 32 of 



Vontobel (2011) always converges to the minimum of the Bethe FE. Future work is required 



to compare the convergence speed of the two algorithms. 



Appendix C. Mean-Field (Fermi) Approach 
C.l Exact MF-based Relations for Permanents 



We present here a simple proof of Eq. (13), essentially following the logic of what was 



described above for BP in Appendix B.l 



Weighting the logarithm of Eqs. (12) with that doubly stochastic which minimizes 
Eq. ( [131 ), summing the result over all the edges, and making use of Eqs. Jl2|9l ), one derives 



^2 fa \og(viVj) = ^2 lo g^ + "^2 logyJ 

(i,j)es ieVi jev 2 

= (Aj logipij/Pij) + Pij log(l - fa)) = log Z M f(p) + log(l"/%).(36) 

(i,j)es (ij)ee 



On the other hand, applying the permanent to both sides of Eq. (12) one arrives at 



perm(p) = perm(/3./(l — /?)) I JJ 



ieVi 




(37) 



Combining Eq. (36) with Eq. (37) results in Eq. (13). 



C.2 Iterative Scheme for Solving Mean-Field Equations 



An efficient way to find a (unique) solution of the MF system of Eqs. (12) for doubly 
stochastic /3 is to initialize with 1^(0) = v^(0) = 1 and iterate according to 



Pij 



Pij(n + i) = — , 
Pij + Vi(n)v3 (n) 

Vi(n+ 1) = Vj(n) Pijjn), i; J (n + 1) = v J (n) ^ Pijjri), 



(38) 
(39) 



until the tolerance S > max(abs(/3(n + 1) — /3(n))) is met. 



Appendix D. Fractional Approach 
D.l Exact Relations for Permanents 



We present here a simple proof of Eq. (15), which is a direct generalization of what was 
discussed above in Appendices B.1|C.1| 
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Weighting the logarithm of Eqs. (14) with that doubly stochastic (3 which minimizes 



Eq. (15), summing the result over all the edges, and making use of Eqs. ( |14|T6| |, one derives 



^2 iog(wiWj) = ^2 i °s w i + ^2 logu,J 

(i,j)e£ ieVi jev 2 

= (hi log(pij/Pij) + 7&j log(l - p^)) 

= log Z<f\f3\p)+ j Yl Ml-fti)- 



(40) 



On the other hand, applying the permanent to both sides of Eq. (14) one arrives at 



perm(p) — perm(/3./(l — 




(41) 



Combining Eq. (40) with Eq. (41) results in Eq. (15). 



D.2 Iterative Scheme for Solving Fractional Equations 



All edge-local terms in the fractional functional (16) are convex in (3 E [0; 1] for 7 > 0, while 
for negative 7 the edge-term convexity holds only when all elements of (3 are smaller than a 
threshold f3 c > 1/2, which is a solution of /3 C log(/3 c ) = —7(1 — (3 C ) log(l — f3 c ). This suggests 
different iterative schemes for positive and negative 7. 

When 7 > we use the following modification of the MF scheme ( 38p9 ): 



/%(n + l) 



Pij (1 - Pijin)) 



7-1 



Pij (I ~ /% W) 7 " 1 + Wi(n)wi(n) ' 
(n + 1) = Wi(n)^2Pij(ri), w J (n + 1) = w j (n) ^ Pijjn), 



In the case of 7 < we use the following modification of the BP scheme 
V(i,j): pij(n + l) = \pij(n) 



+ 



(l-X)p ij (l + p ij (n)) 



1+7 



Pij (l + Pij{n)Y+i + (Z k P{(n)/2 + Z k #(n)/2 - ^{n))^^ (n)) ' 
EkPik(l + Pik(n)) 1+ yw k (r 



Vi : Wi(n + 1) 



Mj : w j (n + l) 



n 



i-£ ; (/%W) 5 



EfcPfcj(l + fojM) 1+7 M(") 

1 - Ei(AiH) 2 



(42) 



(43) 



where the arguments of the /3's indicate the order of the iterations. The damping parameter 
A (typically chosen 0.4 -i- 0.5) helps with convergence. To ensure appropriate accuracy for 



solutions with /3's close to zero or unity we also insert a normalization step after Eqs. (42) 
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but prior to Eqs. (43), making the following two transformations consequently, 

(a) V(z,j): 

k 

(b) V(z,j): /3y->/VE$- 

The algorithm is sensitive to initial values for (3 and iu. To ensure convergence, we initiate 
the algorithm with the output of the MF scheme (which converges much more easily) 



described in Appendix C.2, i.e. /3(0) = /3mf and w(0) = vmf- Numerical experiments 



show that this procedure converges to a stationary point of the fractional FE (16). We also 
verified that the iterative scheme designed for 7 < also converges in the 7 > case, even 
though the former scheme is obviously faster. 

Note, that fractional version of the standard Sum-Product Algorithm (SPA) can be 
developed. It is also natural to expect, in view of the general convexity of the fractional 
FE discussed in the main body of the text, that there exists a provably convergent version 
of the SPA. It will be important to design such a convergent 7-SPA in the future and to 
compare its practical performance against one of the heuristics described above. 

Appendix E. BP Gives Lower Bound on the Permanent 



Here we give our version of the proof of the lower bound (18). First of all, in the case 
when the Bethe FE reaches its minimum in the interior of the domain, i.e. at (3 G B p , 



Eq. (18) follows directly from the main result of Watanabe and Chertkov (2010), i.e. Eq. Q, 



and Schijver's inequality (19). Therefore, according to explanations of Section 3.1.1| we 
only need to analyze the case when the minimum of the Bethe FE is a partially resolved 
solution, with /3 which can be split by appropriate permutations of rows and columns of the 
matrix into a perfect matching block (corresponding to a corner of the respective projected 
polytope), the block with all elements smaller than unity and nonzero unless the respective 
element of p is zero (thus lying in the interior of the respective subspace), and all cross 
elements of f3 (between the blocks) equal to zero. Then, Z q _bp for such a partially resolved 
solution is split into the product of two contributions, Z _bp — Z pm • Zi n t, where Z pm 
corresponds to the perfect matching block, and Zi nt corresponds to the interior block. In 
fact, Z pm is equal to the weighted perfect matching block of p and — log(Zi nt ) corresponds 
to the minimum of the Bethe FE computed for the interior block of p. On the other hand 
the full partition function, Z, can be bounded from below by the product Z > Z\ • Z2, 
where Z\ and Z2 are permanents of the first and second blocks of the original matrix p. 
(Thus contributions of all the cross-terms of p into Z are ignored.) However, Z\ > Z pm , 
as counting only one perfect matching (and ignoring others), and Z2 > Zi nt in accordance 
to what was already shown above for any minimum of Bethe FE achieved in the interior of 
the respective domain □. 

Appendix F. Pruning of the Matrix 

Computing the permanent of sufficiently dense matrixes exactly with the ZDD approach 
explained in Appendix [G] is infeasible for n > 30. To overcome this difficulty we choose 
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to sparsify dense matrices generated in one of our experimental ensembles, removing their 
less significant entries in the following steps. First, we use LP, described in Appendix 
A. 2, to find the permutation correspondent to the maximum perfect matching. To avoid 



getting a zero permanent in the result, we include all components of the perfect matching 
permutation in the pruned matrix. Second, we consider every other entry of the matrix 
and keep it in the matrix only if it is included in a perfect matching which is close to the 
maximum perfect matching, i.e. the two permutations share all but a few of their entries 
and ratio of their weighted contributions (in the permanent) is larger than a pre-defined 
value. Then, we act according to either of the two strategies, both of which are explored in 
this manuscript. One strategy is to include all permutations whose products are more than 
a given fraction of the main permutation. This method will tend to reduce the fluctuations 
in the error of the pruned matrix (i.e. will reduce the variation in ^ prU ned /^original)- The 
other method is to always prune a set fraction of entries from the matrix, and prune them 
in order of decreasing value as determined by the above criterion. This method will reduce 
the fluctuations in the runtime of the algorithm. 



Appendix G. Zero-suppressed Binary Decision Diagrams (ZDDs) method 

Zero-suppressed Binary Decision Diagrams, or ZDDs, are a tool useful for representing 



combinatorial problems. The concept was introduced by Shin-Ichi Minato in 1993 jMinato 
( [19931 ). The idea of ZDDs 

is as follows: if one defines a combinatorial problem to be a 
function of many variables, each taking values in {0, 1}, with the value of the function itself 
being also in {0, 1}, then those sequences of inputs that lead to unity can be thought of 
as "solutions" to the problem. Furthermore, each solution can be described in terms of the 
input variables within it that are equal to unity. The problem, then, can be described as 
being a "family of sets", or set of sets, where the family is of all solutions to the problem 
and each set within the family is the set of input variables whose value is 1 in that solution. 



To give an example of the "family of sets" concept, consider the XOR function, which 
returns 1 if and only if the inputs are equal. This function can also be represented as the 
family of sets {0, {1, 2}}, where 1 and 2 correspond to inputs 1 and 2 to the function, because 
if the function is to have value 1 then either both inputs must be equal to 1 or neither must 
be. Once this has been understood, it is best to see the ZDDs as nothing more than a 
concise representation of this family of sets, since the family can get quite cumbersome for 
problems with many solutions and many variables. Note that this system of representing 
problems provides the greatest improvement when there are few solutions, and when the 
solutions themselves are sparse, since the family of sets is then small. Correspondingly, 
ZDDs are most efficient under these conditions. 

The actual format of a ZDDs is that of a directed tree of nodes, with each node having 
a directed edge to two other nodes. Each edge emanating from a node has an identity, in 
that it is either a "HI" branch or a "LO" branch, and of the two edges emanating from each 
node, there must be exactly one "HI" branch and one "LO" branch. Each node also has an 
identity, a number from 1 to n if there are n inputs to the combinatorial problem. The tree 
must contain one or two special nodes, or "sinks", one of which is the '"True" sink, and 
optionally the "False" sink. We also introduce the conventions that nodes can only point 
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to nodes of higher identity than themselves and that no two nodes can be identical in both 
their identity and their LO and HI pointers. 

Each node in a ZDD represents a choice about the variable the node identifies. If one 
begins at the top node of a ZDD, taking the HI branch represents including the variable 
represented by the node's identity in a prospective solution, and taking the LO branch 
represents not including that variable. If a LO or HI branch points to the True sink, that 
implies that a solution is reached if and only if all variables with identity greater than the 
current node identity are not included. If a LO branch points to the False sink, that implies 
that no solution is possible given the choices made previously. Interestingly, the constraints 
introduced in the paragraph previous to this one imply that a HI branch can never point 
to the False sink. 



ZDDs are best understood with examples. The first example, also illustrated in Fig.|13(a) 



is of the ZDD for the exactly- two function of three variables, in other words, the function 
that returns 1 if exactly two of its three inputs have value 1 and otherwise. It can also 
be described as the family of sets {{1, 2}, {1, 3}, {2, 3}}. Here, a dotted line denotes a LO 
branch and a normal line denotes a HI branch. Furthermore, the T and F symbols denote 
the True and False sinks, respectively, and the numbers inside each node refer to the nodes' 
identities (the variables that they represent). Our second example, shown in Fig. |13(b)[ 
represents the family of sets {0, {1}, {1, 2}, {1, 3}}. Note the absence of a False sink. Note, 
also, the fact that a node's HI and LO branches need not point to the different locations. 
A more in-depth exploration of the ZDDs concept can be found in Knuth (2009). 

Once the basic concept of ZDDs is introduced, one can use it for solving various com- 
binatorial problems, e.g. to represent a permanent as a ZDD in order to use the method. 
When we apply ZDDs to the computations of permanent, we classify each entry of the 
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matrix as either zero or nonzero. Then, we define a variable for each nonzero entry in the 
matrix. Each solution of our resulting ZDD will represent a possible permutation, mean- 
ing a set of entries in the matrix such that exactly one entry in each row and column is 
included in the set. There is a recursive algorithm, suggested in |Knuth (2009), that allows 
for efficient counting of the solutions of the ZDD. The algorithm is simple: the number of 
solutions of a ZDD rooted at a node is equal to the sum of the numbers of solutions of the 
ZDDs rooted at the HI and LO children of that node. The True sink is defined as having 
1 solution, and the False sink as having 0. Note that the number of solutions of a ZDD 
representing a matrix is equal to the permanent of the corresponding to — 1 matrix, with 
each 1 corresponding a nonzero entry. 

In order to find the permanent of matrices that are not 0-1 matrices, only a small 
modification is necessary. Instead of purely counting solutions of the ZDDs, we do a weighted 
count, where the weighted number of solutions of a ZDD rooted at a node is equal to the 
value of the corresponding matrix entry times the weighted number of solutions at the HI 
child added to the weighted number of solutions at the LO child. In other words, if we are 
considering a node n with children HI and LO whose identity corresponds to a matrix entry 
of nonzero value v, then 



WeightedCount(n) = v • WeightedCount(HI) + WeightedCount(LO). 



The WeightedCount of the root node of the ZDDs will be equal to the permanent of the 
corresponding matrix. 

This leaves the question of how to build the ZDDs from the matrix. This is done using 
Knuth's "melding" algorithm. The algorithm is somewhat complex and will not be described 
here, but it is described in detail in |Knuth| ([2009). The melding algorithm is an efficient 
and systematic method for constructing larger ZDDs out of the logical combination of 
smaller ones. The smallest ZDDs being melded together using Knuth's algorithm are ZDDs 
representing the "exactly-one" constraint for each row and column of the matrix; in other 
words, they are constraints requiring exactly one matrix entry in every row and column to 
be included in a permutation which will be a "solution" to our problem. 



Appendix H. Comparison of Ryser's formula with the ZDDs method 

As part of our experiments we compared the speed of Ryser's formula with the speed of 
the ZDDs method by counting memory accesses in each of the two algorithms in order to 
fairly compare them. We found that the values we got for memory accesses were strongly 
correlated with the actual speed of the algorithm. We found that for very dense matrices, 
Ryser's formula is faster, but for sparser matrices the ZDDs method is faster. We performed 
experiments with matrices that were 20%, 40%, and 60% sparse in order to get a good idea 
of the point where the ZDDs method starts outperforming Ryser's formula. (Naturally, 
with no pruning, Ryser's formula outperforms the ZDDs method significantly.) 



As can be seen from Fig. 14, the ZDDs method begins outperforming Ryser's formula 



when matrices are around 60% sparse. 
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Figure 14: Scatter plot of the number of memory accesses required to exactly compute the 
permanent of a sparse matrix for instances with different degree of pruning. 
Red and blue dots mark results of the Ryser formula and of the ZDD method 
respectively. 
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